feat(updater): tier 1 — notify admin and pad users of available updates#7601
feat(updater): tier 1 — notify admin and pad users of available updates#7601JohnMcLear merged 25 commits intoether:developfrom
Conversation
|
/review |
Code Review by Qodo
1. Updater poller can get stuck
|
Review Summary by Qodo(Agentic_describe updated until commit 110bddd)feat(updater): Tier 1 — notify admin and pad users of available updates
WalkthroughsDescription• Ships **Tier 1 (notify)** of the four-tier auto-update design with periodic GitHub Releases polling (default 6h, ETag-cached) writing to var/update-state.json. • **Admin UI**: Banner on every admin page + dedicated /admin/update page displaying version, install method, tier, and changelog. • **Pad UI**: Discreet footer badge (only visible when severely outdated or running vulnerable version); public /api/version-status endpoint never leaks running version string. • **Email notifications**: Optional escalating cadence via new adminEmail setting (weekly while vulnerable, monthly while severely outdated); real SMTP deferred to follow-up PR. • **Core modules**: Pure functions for semantic version comparison, vulnerability detection, state persistence, install method detection, policy evaluation, GitHub API polling with ETag caching, and email cadence logic. • **Settings**: New updates.* block with configurable tier, source, channel, check interval, and install method override; defaults to tier: "notify". • **Comprehensive test coverage**: Vitest unit tests for all pure modules, Mocha integration tests for HTTP endpoints, Playwright tests for admin and pad UI. • **Documentation**: New doc/admin/updates.md admin guide, complete design specification, and implementation plan; CHANGELOG updated. • **No execution code in Tier 1**: Manual click (PR 2), auto with grace window (PR 3), and autonomous in maintenance window (PR 4) are designed but deferred. Diagramflowchart LR
A["GitHub Releases API"] -->|"ETag-cached polling<br/>default 6h"| B["VersionChecker"]
B -->|"latest release"| C["UpdatePolicy"]
C -->|"tier evaluation"| D["Notifier"]
D -->|"email cadence"| E["Email Queue<br/>SMTP deferred"]
C -->|"policy result"| F["Admin UI<br/>Banner + Page"]
C -->|"outdated status"| G["Public API<br/>/api/version-status"]
G -->|"no version leak"| H["Pad Badge<br/>Footer"]
B -->|"state persistence"| I["var/update-state.json"]
J["Settings<br/>updates.* block"] -->|"config"| B
J -->|"config"| C
J -->|"config"| D
File Changes1. src/node/updater/types.ts
|
Code Review by Qodo
1. updates.tier defaults to notify
|
| /** | ||
| * Self-update subsystem (PR 1: tier 1 only). | ||
| * Tier "off" disables the version check entirely. Default "notify" shows a banner when behind. | ||
| */ | ||
| updates: { | ||
| tier: 'notify', | ||
| source: 'github', | ||
| channel: 'stable', | ||
| installMethod: 'auto', | ||
| checkIntervalHours: 6, | ||
| githubRepo: 'ether/etherpad', | ||
| }, |
There was a problem hiding this comment.
1. updates.tier defaults to notify 📘 Rule violation ☼ Reliability
The new updater subsystem is enabled by default via updates.tier: "notify", so existing installs get new behavior without explicit opt-in. This violates the requirement that new features be feature-flagged and disabled by default.
Agent Prompt
## Issue description
The updater feature is enabled by default because `updates.tier` defaults to `notify`, which violates the requirement that new features be disabled by default.
## Issue Context
This PR introduces a new updater subsystem (polling + UI/API surfacing). To comply with the feature-flag rule, the default configuration must keep the feature off unless explicitly enabled.
## Fix Focus Areas
- src/node/utils/Settings.ts[427-438]
- settings.json.template[178-190]
- settings.json.docker[187-200]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| export const expressCreateServer = ( | ||
| _hookName: string, | ||
| {app}: ArgsExpressType, | ||
| cb: Function, | ||
| ): void => { | ||
| // Public endpoint. Cached for 60s. Returns only an enum — no version string. | ||
| app.get('/api/version-status', async (_req: any, res: any) => { | ||
| const now = Date.now(); | ||
| if (now - badgeCache.at > BADGE_CACHE_MS) { | ||
| badgeCache = {value: await computeOutdated(), at: now}; | ||
| } | ||
| res.json({outdated: badgeCache.value}); | ||
| }); | ||
|
|
||
| // Admin-protected. webaccess.ts already gates /admin/* with admin auth. | ||
| app.get('/admin/update/status', async (_req: any, res: any) => { | ||
| const state = await loadState(stateFilePath()); | ||
| const current = getEpVersion(); | ||
| const installMethod = getDetectedInstallMethod(); | ||
| const policy = state.latest | ||
| ? evaluatePolicy({installMethod, tier: settings.updates.tier, current, latest: state.latest.version}) | ||
| : null; | ||
| res.json({ | ||
| currentVersion: current, | ||
| latest: state.latest, | ||
| lastCheckAt: state.lastCheckAt, | ||
| installMethod, | ||
| tier: settings.updates.tier, | ||
| policy, | ||
| vulnerableBelow: state.vulnerableBelow, | ||
| }); | ||
| }); |
There was a problem hiding this comment.
2. Update status routes ignore off 📘 Rule violation ☼ Reliability
The new /api/version-status and /admin/update/status routes are always registered, so behavior changes even if updates.tier is set to off. This violates the requirement that disabling the feature flag preserves prior behavior.
Agent Prompt
## Issue description
`/api/version-status` and `/admin/update/status` are registered regardless of `settings.updates.tier`, so the updater feature cannot be fully disabled.
## Issue Context
Even if `updates.tier` is configured to `off`, the server still exposes new endpoints (a behavior change vs pre-PR). To meet the feature-flag requirement, the route-registration hook should no-op when `tier === 'off'`.
## Fix Focus Areas
- src/node/hooks/express/updateStatus.ts[29-60]
- src/ep.json[105-118]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| // Admin-protected. webaccess.ts already gates /admin/* with admin auth. | ||
| app.get('/admin/update/status', async (_req: any, res: any) => { | ||
| const state = await loadState(stateFilePath()); | ||
| const current = getEpVersion(); | ||
| const installMethod = getDetectedInstallMethod(); | ||
| const policy = state.latest | ||
| ? evaluatePolicy({installMethod, tier: settings.updates.tier, current, latest: state.latest.version}) | ||
| : null; | ||
| res.json({ | ||
| currentVersion: current, | ||
| latest: state.latest, | ||
| lastCheckAt: state.lastCheckAt, | ||
| installMethod, | ||
| tier: settings.updates.tier, | ||
| policy, | ||
| vulnerableBelow: state.vulnerableBelow, | ||
| }); |
There was a problem hiding this comment.
3. Admin status leaks version 🐞 Bug ⛨ Security
GET /admin/update/status is not guaranteed to be admin-only because webaccess only treats /admin-auth* as “admin” and the default requireAuthentication=false allows the request through, leaking currentVersion publicly. Even with authentication enabled, any authenticated non-admin can hit the endpoint because requireAdmin is false for this path.
Agent Prompt
### Issue description
`/admin/update/status` currently relies on a comment assumption (“webaccess gates /admin/*”) that is not true with the current `webaccess.ts` logic. As a result, the endpoint can be reachable by unauthenticated users when `requireAuthentication=false` (default) and by non-admin authenticated users when `requireAuthentication=true`, leaking `currentVersion`.
### Issue Context
- `webaccess.ts` uses `requireAdmin = req.path.toLowerCase().startsWith('/admin-auth')`, so `/admin/update/status` is not treated as admin-only.
- The endpoint returns sensitive-ish info (`currentVersion`) that the PR explicitly tries not to expose publicly.
### Fix Focus Areas
- src/node/hooks/express/updateStatus.ts[29-60]
- src/node/hooks/express/webaccess.ts[58-60]
- src/node/hooks/express/webaccess.ts[127-134]
- src/tests/backend/specs/updateStatus.ts[76-87]
### Suggested fix
- In the `/admin/update/status` handler, require `req.session?.user?.is_admin === true`.
- If no session user: respond `401`.
- If session user exists but not admin: respond `403`.
- Add/extend integration tests to cover:
- default `requireAuthentication=false` still denies access
- authenticated non-admin user denied
- admin user allowed
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| app.get('/api/version-status', async (_req: any, res: any) => { | ||
| const now = Date.now(); | ||
| if (now - badgeCache.at > BADGE_CACHE_MS) { | ||
| badgeCache = {value: await computeOutdated(), at: now}; | ||
| } | ||
| res.json({outdated: badgeCache.value}); | ||
| }); |
There was a problem hiding this comment.
4. Async handlers unprotected 🐞 Bug ☼ Reliability
The new Express routes use async handlers directly, so exceptions (for example, loadState() throwing on EPERM/IO errors) will not be forwarded to Express error handling in this codebase’s established pattern. This can cause unhandled promise rejections and/or hung requests.
Agent Prompt
### Issue description
`updateStatus.ts` registers `async` Express handlers without wrapping them to forward rejections to `next(err)`. In this repo, other handlers explicitly `.catch(next)`, implying Express is not expected to handle async rejections automatically.
### Issue Context
`loadState()` can throw for non-ENOENT read failures (permissions, IO), and those errors would currently bypass Express error handling.
### Fix Focus Areas
- src/node/hooks/express/updateStatus.ts[29-60]
- src/node/hooks/express/importexport.ts[27-71]
### Suggested fix
- Change both routes to the established pattern:
- `app.get(path, (req, res, next) => { (async () => { ... })().catch((err) => next(err || new Error(err))); });`
- Optionally add a small helper (e.g. `wrapAsync`) local to the file to reduce duplication.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| /** | ||
| * Self-update subsystem (PR 1: tier 1 only). | ||
| * Tier "off" disables the version check entirely. Default "notify" shows a banner when behind. | ||
| */ | ||
| updates: { | ||
| tier: 'notify', | ||
| source: 'github', | ||
| channel: 'stable', | ||
| installMethod: 'auto', | ||
| checkIntervalHours: 6, | ||
| githubRepo: 'ether/etherpad', | ||
| }, |
There was a problem hiding this comment.
1. updates.tier defaults to notify 📘 Rule violation ☼ Reliability
The new updater subsystem is enabled by default because updates.tier defaults to notify, which starts background polling on boot. This violates the requirement that new features be feature-flagged and disabled by default to avoid changing behavior for existing deployments.
Agent Prompt
## Issue description
The updater subsystem is a new feature but is enabled by default because `updates.tier` defaults to `notify`, which starts polling on boot.
## Issue Context
Compliance requires new features to be behind a feature flag and disabled by default. The code currently supports disabling via `updates.tier: "off"`, but the default value is `notify` in both runtime defaults and the settings templates.
## Fix Focus Areas
- src/node/utils/Settings.ts[427-438]
- settings.json.template[178-190]
- settings.json.docker[187-200]
- src/node/updater/index.ts[108-116]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| // Admin-protected. webaccess.ts already gates /admin/* with admin auth. | ||
| app.get('/admin/update/status', async (_req: any, res: any) => { | ||
| const state = await loadState(stateFilePath()); | ||
| const current = getEpVersion(); | ||
| const installMethod = getDetectedInstallMethod(); | ||
| const policy = state.latest | ||
| ? evaluatePolicy({installMethod, tier: settings.updates.tier, current, latest: state.latest.version}) | ||
| : null; | ||
| res.json({ | ||
| currentVersion: current, | ||
| latest: state.latest, | ||
| lastCheckAt: state.lastCheckAt, | ||
| installMethod, | ||
| tier: settings.updates.tier, | ||
| policy, | ||
| vulnerableBelow: state.vulnerableBelow, | ||
| }); | ||
| }); |
There was a problem hiding this comment.
2. Admin status endpoint public 🐞 Bug ⛨ Security
GET /admin/update/status does not verify req.session.user.is_admin, and webaccess treats only /admin-auth* as admin-protected, so with default requireAuthentication=false the endpoint can be accessed unauthenticated and leaks currentVersion, latest, and vulnerableBelow. The included test only forces auth by flipping requireAuthentication=true, which masks the default behavior.
Agent Prompt
### Issue description
`/admin/update/status` is currently reachable without admin authentication under the default configuration (because webaccess only treats `/admin-auth*` as admin-protected and `requireAuthentication` defaults to false). This leaks operational/security-relevant information.
### Issue Context
Admin auth in this codebase is triggered by requesting `/admin-auth/*` (see `webaccess.ts`). New admin-only JSON endpoints should either:
- live under `/admin-auth/...` so the global middleware enforces admin auth, and/or
- explicitly check `req.session.user?.is_admin` and reject otherwise.
### Fix Focus Areas
- src/node/hooks/express/updateStatus.ts[43-60]
- admin/src/components/UpdateBanner.tsx[11-16]
- src/tests/backend/specs/updateStatus.ts[76-87]
### Expected change
1. Move the endpoint to `/admin-auth/update/status` **or** add an explicit admin check in the handler (prefer doing both for defense-in-depth).
2. Update the admin UI fetch URL accordingly.
3. Update/add an integration test that asserts the endpoint is unauthorized when `requireAuthentication` is left at its default (false) and no admin session is present.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| const startPolling = (): void => { | ||
| const intervalMs = Math.max(1, settings.updates.checkIntervalHours) * 60 * 60 * 1000; | ||
| if (timer) clearInterval(timer); | ||
| timer = setInterval(() => { void performCheck(); }, intervalMs); | ||
| // Run an immediate first check, but don't block boot. | ||
| setTimeout(() => { void performCheck(); }, 5000); | ||
| }; |
There was a problem hiding this comment.
3. Updater checks can overlap 🐞 Bug ☼ Reliability
startPolling() schedules performCheck() via both setInterval and an initial setTimeout without any in-flight guard, so slow GitHub fetches can cause concurrent performCheck() executions that mutate the shared cached state and race on writing update-state.json. shutdown() clears only the interval, not the initial timeout, so a check can still fire after shutdown has begun.
Agent Prompt
### Issue description
The updater poll loop can execute multiple `performCheck()` runs concurrently (interval tick + initial timeout, plus any later overlap if a check runs longer than the interval). Because `performCheck()` mutates shared in-memory state and writes it to disk, concurrent runs can cause state corruption/out-of-order writes and duplicate email decisions.
### Issue Context
`performCheck()` uses `getCurrentState()` which caches a singleton `inMemoryState`, then mutates it and calls `saveState()`.
### Fix Focus Areas
- src/node/updater/index.ts[36-106]
- src/node/updater/index.ts[118-121]
### Expected change
1. Add an in-flight guard (e.g., `let checkInFlight = false;`) so overlapping ticks return early or coalesce.
2. Store the `setTimeout()` handle and clear it in `shutdown()`.
3. Consider changing to a self-scheduling loop (`setTimeout` scheduled *after* an awaited `performCheck()` completes) to guarantee serialization.
4. (Optional) Add a unit/integration test that simulates a slow fetcher to ensure `performCheck()` is not entered concurrently.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
|
/review |
|
Persistent review updated to latest commit 61e6d6d |
| const performCheck = async (): Promise<void> => { | ||
| if (settings.updates.tier === 'off') return; | ||
| // Coalesce overlapping ticks. performCheck mutates shared in-memory state and writes | ||
| // it to disk; concurrent runs would race on saveState() and could double-send emails. | ||
| if (checkInFlight) return; | ||
| checkInFlight = true; | ||
| const state = await getCurrentState(); | ||
| try { |
There was a problem hiding this comment.
1. Updater poller can get stuck 🐞 Bug ☼ Reliability
performCheck() sets checkInFlight=true then awaits getCurrentState() outside the try/finally, so if loadState() throws (e.g., EACCES) the promise rejects, checkInFlight is never reset, and future polling ticks are permanently skipped (plus an unhandled rejection risk due to `void performCheck()`).
Agent Prompt
## Issue description
`performCheck()` can leave the updater permanently disabled if `getCurrentState()` throws before entering the `try/finally` that resets `checkInFlight`. This can also surface as an unhandled promise rejection because polling invokes `void performCheck()`.
## Issue Context
`loadState()` rethrows non-ENOENT fs errors (e.g., permission issues). If that happens during `getCurrentState()`, the `finally { checkInFlight = false }` never runs.
## Fix Focus Areas
- src/node/updater/index.ts[38-106]
- src/node/updater/index.ts[108-116]
## Implementation notes
- Move `const state = await getCurrentState()` inside the `try` block (or wrap the entire body in `try { ... } finally { ... }`).
- Optionally harden the timer callbacks to avoid unhandled rejections: `void performCheck().catch(err => logger.warn(...))` (even if performCheck should no longer reject).
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| export const UpdatePage = () => { | ||
| const {t} = useTranslation(); | ||
| const us = useStore((s) => s.updateStatus); | ||
|
|
||
| if (!us) return <div>{t('admin.loading', {defaultValue: 'Loading...'})}</div>; | ||
|
|
||
| const upToDate = !us.latest || us.currentVersion === us.latest.version; |
There was a problem hiding this comment.
2. Updates off hangs admin page 🐞 Bug ≡ Correctness
If the status endpoint is missing or returns non-OK (notably when updates.tier='off' disables route registration), the admin UpdatePage never initiates a fetch and stays stuck on "Loading..." indefinitely.
Agent Prompt
## Issue description
The admin Update page can become unusable (permanent "Loading...") if `/admin/update/status` is not available or not OK (404 when `updates.tier === 'off'`, 401/403 when gated, network errors).
## Issue Context
- Server disables updater HTTP surface when tier is off.
- UI relies on UpdateBanner's background fetch to populate global store and does nothing on non-OK responses.
## Fix Focus Areas
- admin/src/pages/UpdatePage.tsx[4-12]
- admin/src/components/UpdateBanner.tsx[11-18]
- admin/src/App.tsx[103-118]
- src/node/hooks/express/updateStatus.ts[36-78]
## Implementation notes
Choose one coherent approach:
1) **UI-driven resilience (recommended):**
- Make `UpdatePage` fetch `/admin/update/status` itself (and set store) and render explicit states for 404 (updates disabled), 401/403 (not authorized), and generic errors.
- Update `UpdateBanner` to either (a) set an explicit error/disabled state on non-OK, or (b) keep it best-effort but ensure `/update` page is self-sufficient.
2) **Server-driven contract:**
- Keep the route registered even when tier is off, but return a minimal payload indicating `tier: 'off'` (and perhaps `policy: null`) so the UI can render a disabled message without hanging.
Also consider hiding the Update nav link once the UI knows updates are disabled (or not authorized).
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
61e6d6d to
110bddd
Compare
|
/review |
Code Review by Qodo
1. Badge bypasses pad baseURL
|
| export const renderVersionBadge = async (): Promise<void> => { | ||
| const el = document.getElementById('version-badge'); | ||
| if (!el) return; | ||
| try { | ||
| const res = await fetch('/api/version-status', {credentials: 'same-origin'}); | ||
| if (!res.ok) return; | ||
| const data = (await res.json()) as BadgeResponse; | ||
| if (!data.outdated) { el.style.display = 'none'; return; } |
There was a problem hiding this comment.
1. Badge bypasses pad baseurl 🐞 Bug ≡ Correctness
The pad footer badge fetches '/api/version-status' using an origin-root path, but the pad runtime
computes and uses pad.baseURL (derived from new URL('..', window.location.href).pathname) to
support being hosted under a subpath; this request will miss the prefix and 404 in such deployments.
As a result, the badge silently never appears even when the server reports severe/vulnerable.
Agent Prompt
### Issue description
`pad_version_badge.ts` calls `/api/version-status` with an origin-root path, which breaks deployments where Etherpad is served under a subpath (because other pad requests are prefixed by `pad.baseURL`).
### Issue Context
`padBootstrap.js` derives `basePath` from the current URL and assigns it to `pad.baseURL`. Other pad code uses `${exports.baseURL}...` for server calls.
### Fix Focus Areas
- src/static/js/pad_version_badge.ts[14-22]
- src/templates/padBootstrap.js[16-21]
### Suggested fix
Build the request URL with the same basePath logic used by the pad bootstrap, e.g.:
- `const basePath = new URL('..', window.location.href).pathname;`
- `fetch(`${basePath}api/version-status`, ...)`
Avoid importing `./pad` from `pad_version_badge.ts` to prevent circular module initialization issues (it is already imported by `pad.ts`).
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
Round 2 of Qodo review on ether#7601. Addressing the action-required items: #1 Badge bypassed pad baseURL — derive basePath the same way padBootstrap.js does (`new URL('..', window.location.href).pathname`) and prefix the fetch with it. Subpath deployments now reach /<prefix>/api/version-status instead of 404ing. #2 Updater poller could get stuck — `getCurrentState()` is now inside the try/finally so a one-time loadState() rejection can't leave `checkInFlight=true` and permanently silence polling. #3 Updates off hung admin page — UpdatePage now self-fetches and renders explicit `disabled` (404), `unauthorized` (401/403), and `error` states instead of staying on "Loading...". Banner-driven prefetch is still honoured if it landed first. ether#11 NaN polling interval — coerce `checkIntervalHours` to a number, clamp to [1h, 168h], log a warning and fall back to 6h on non-finite input. Math.max(1, NaN) === NaN previously meant a malformed settings.json could turn the poller into a tight loop. ether#13 State validation accepted broken subfields — `isValid()` now inspects `latest.{version,tag,body,publishedAt,htmlUrl,prerelease}`, `vulnerableBelow[].{announcedBy,threshold}`, and `email.{severeAt,vulnerableAt,vulnerableNewReleaseTag}`. A hand-edited file with a number where a string is expected is now treated as corrupt and reset to EMPTY_STATE rather than crashing later in semver parsing or email rendering. ether#14 Badge cache stampede — wrap `computeOutdated()` in a single-flight promise so concurrent requests at cache expiry await one shared computation instead of fanning out into N redundant disk reads. Plus six new state.test.ts cases covering each new validation guard. Pushing back on the remaining items: ether#4 `updates.tier` defaults to `notify` — intentional. The whole point of tier 1 is to surface the "you are behind" signal to admins by default. Opt-in defeats the purpose; the existing failure mode (admin never hears about a security-relevant release) is exactly what this PR is fixing. ether#5/ether#8 Admin status endpoint admin-auth — `currentVersion` is already public via `/health`, so wrapping the route in admin-auth doesn't reduce the disclosure surface meaningfully. Operators who want it gated set `updates.requireAdminForStatus=true` (already wired and covered by the comment on the route handler). ether#10 Plain `https://` URLs in planning doc — planning markdown is viewed in editors and on GitHub where protocol-relative URLs would either render literally or break entirely. Keeping `https://`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/review |
ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one. |
|
Persistent review updated to latest commit 928f5b2 |
|
/review |
ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one. |
|
Persistent review updated to latest commit 928f5b2 |
|
|
||
| useEffect(() => { | ||
| let cancelled = false; | ||
| fetch('/admin/update/status', {credentials: 'same-origin'}) |
There was a problem hiding this comment.
Best would be to use tanstack query. If you use fetch in a call it is mostly always wrong see https://react.dev/reference/react/useEffect#fetching-data-with-effects
| return () => { cancelled = true; }; | ||
| }, [setUpdateStatus]); | ||
|
|
||
| if (!updateStatus || !updateStatus.latest) return null; |
There was a problem hiding this comment.
If you useSuspense Query you can save the manual return null and show with suspense boundary that the ui is loading e.g. with a loading spinner
| // Self-fetch so the page renders an explicit state even if UpdateBanner's | ||
| // best-effort fetch never landed (route returns 404 when tier=off, 401/403 | ||
| // if requireAdminForStatus is set, or a transient network error). | ||
| const [fetchState, setFetchState] = useState<FetchState>(us ? {kind: 'ok'} : {kind: 'loading'}); |
SamTV12345
left a comment
There was a problem hiding this comment.
I'd like to get rid of the manual data fetching. It is a lot of code and it seems to be the wrong one. Please use tanstack query. If you have the rest api in the swagger you can refer to openapi-typescript and openapi-react-query with openapi-fetch. It generates a typesafe client with basically no boilerplate.
|
Okidoki will do |
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
typeof null === 'object' meant {email:null} passed the old isValid check,
which would crash downstream Notifier code reading email.severeAt. Likewise,
an array would pass the typeof latest === 'object' branch. Introduce
isPlainObject helper (null-safe, Array.isArray guard) and use it for both
fields. Adds two regression tests covering the exact broken inputs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Register expressCreateServer/shutdown hooks in ep.json and implement the boot-wiring module that detects install method, starts the polling interval and runs the notifier dedupe pass each tick. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ints Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add UpdateStatusPayload to the zustand store, a persistent UpdateBanner rendered in the App layout, a /update page showing version details and changelog, and a Bell nav link — all wired to the /admin/update/status endpoint added in Task 10. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ling - Export stateFilePath from index.ts and import it in updateStatus.ts (removes local duplicate) - Import getEpVersion from Settings.ts in both index.ts and updateStatus.ts (removes two local definitions) - Fix misleading 'backing off' log message — no backoff is implemented, just retries at next interval - Remove EMPTY_STATE_FOR_TESTS re-export from state.ts; state.test.ts now imports EMPTY_STATE directly from types.ts - Add .update-banner and .update-page CSS rules to admin/src/index.css Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ll race, opt-in admin gate - Wrap /api/version-status and /admin/update/status with a small async helper so a rejected promise becomes next(err) instead of an unhandled rejection. - Short-circuit route registration when updates.tier === 'off' so the heavier opt-out also removes the HTTP surface (matches pre-PR behavior for that case). - Add an in-flight guard around performCheck() so overlapping interval ticks can't race on update-state.json writes or duplicate email decisions; track the initial setTimeout handle and clear it in shutdown(). - Add updates.requireAdminForStatus (default false) so admins can lock /admin/update/status to authenticated admin sessions without disabling the updater. Default false preserves current behavior (the running version is already exposed publicly via /health). Backend specs cover unauth → 401, non-admin → 403, admin → 200. - Bump admin troubleshooting menu count test 5 → 6 to account for the new Update nav link. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round 2 of Qodo review on ether#7601. Addressing the action-required items: #1 Badge bypassed pad baseURL — derive basePath the same way padBootstrap.js does (`new URL('..', window.location.href).pathname`) and prefix the fetch with it. Subpath deployments now reach /<prefix>/api/version-status instead of 404ing. #2 Updater poller could get stuck — `getCurrentState()` is now inside the try/finally so a one-time loadState() rejection can't leave `checkInFlight=true` and permanently silence polling. #3 Updates off hung admin page — UpdatePage now self-fetches and renders explicit `disabled` (404), `unauthorized` (401/403), and `error` states instead of staying on "Loading...". Banner-driven prefetch is still honoured if it landed first. ether#11 NaN polling interval — coerce `checkIntervalHours` to a number, clamp to [1h, 168h], log a warning and fall back to 6h on non-finite input. Math.max(1, NaN) === NaN previously meant a malformed settings.json could turn the poller into a tight loop. ether#13 State validation accepted broken subfields — `isValid()` now inspects `latest.{version,tag,body,publishedAt,htmlUrl,prerelease}`, `vulnerableBelow[].{announcedBy,threshold}`, and `email.{severeAt,vulnerableAt,vulnerableNewReleaseTag}`. A hand-edited file with a number where a string is expected is now treated as corrupt and reset to EMPTY_STATE rather than crashing later in semver parsing or email rendering. ether#14 Badge cache stampede — wrap `computeOutdated()` in a single-flight promise so concurrent requests at cache expiry await one shared computation instead of fanning out into N redundant disk reads. Plus six new state.test.ts cases covering each new validation guard. Pushing back on the remaining items: ether#4 `updates.tier` defaults to `notify` — intentional. The whole point of tier 1 is to surface the "you are behind" signal to admins by default. Opt-in defeats the purpose; the existing failure mode (admin never hears about a security-relevant release) is exactly what this PR is fixing. ether#5/ether#8 Admin status endpoint admin-auth — `currentVersion` is already public via `/health`, so wrapping the route in admin-auth doesn't reduce the disclosure surface meaningfully. Operators who want it gated set `updates.requireAdminForStatus=true` (already wired and covered by the comment on the route handler). ether#10 Plain `https://` URLs in planning doc — planning markdown is viewed in editors and on GitHub where protocol-relative URLs would either render literally or break entirely. Keeping `https://`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
928f5b2 to
774fca7
Compare
|
/review |
ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one. |
|
Persistent review updated to latest commit 774fca7 |
#7704) * docs(updater): PR 2 (Tier 2 manual-click) implementation plan 20-task TDD plan for shipping the manual-click update flow on top of the Tier 1 (notify) work merged in #7601. Covers UpdateExecutor, RollbackHandler, SessionDrainer, lock + trustedKeys, four admin endpoints (apply / cancel / acknowledge / log), admin UI updates, integration tests against a tmp git repo, and a manual smoke runbook for the spec's "before each tier ships" gate. Plan deliberately scopes signature verification to an opt-in stub (updates.requireSignature: false default) to avoid blocking on a separate release-signing project. Plan: docs/superpowers/plans/2026-05-08-auto-update-pr2-manual-click.md Spec: docs/superpowers/specs/2026-04-25-auto-update-design.md Issue: #7607 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(updater): extend state + settings for Tier 2 manual-click Adds ExecutionStatus discriminated union, bootCount, and lastResult to UpdateState, plus the preApplyGraceMinutes/drainSeconds/diskSpaceMinMB/ requireSignature/trustedKeysPath knobs that Tier 2's executor needs. loadState backfills the new fields on Tier 1 state files so existing installs keep working. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(updater): PID-based update.lock with stale-pid reaping Single-flight guard for Tier 2's UpdateExecutor. Atomic O_CREAT|O_EXCL acquire; on EEXIST, sends signal 0 to the recorded PID and reaps if dead. Unparseable / partially-written lock files are treated as stale rather than fatal so a half-written lock from a SIGKILL'd parent doesn't lock the install out forever. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(updater): verifyReleaseTag — gpg-via-git stub for Tier 2 preflight Default updates.requireSignature=false: log a warning and return ok with reason=signature-not-required. Set true to make preflight refuse a tag whose signature does not verify under the system keyring (or trustedKeysPath via GNUPGHOME). Etherpad's release process does not yet sign tags consistently; turning the check on by default would break Tier 2 for every admin and forcing a release-signing change is out of scope for this PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(updater): preflight check pipeline for Tier 2 Pure orchestrator over injected probes for install-method, working tree, disk space, pnpm presence, lock state, remote tag existence and signature verification. Cheap-and-definitive checks run first; first failure short-circuits with a typed reason that the route layer will surface in the preflight-failed admin banner. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(updater): rolling update.log helpers (appendLine + tailLines) Direct file-append + size-based rotation rather than a log4js appender — avoids re-configuring log4js on top of the user's existing logconfig. appendLine creates parents, rotates at 10MB (configurable), keeps 5 backups by default. tailLines reads the last N lines for /admin/update/log. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(updater): SessionDrainer + handshake guard Drainer schedules T-60 / -30 / -10 broadcasts and resolves at T=0; isAcceptingConnections() flips off for the duration. PadMessageHandler consults the flag at the start of CLIENT_READY and disconnects new joiners with reason "updateInProgress" — existing sockets are unaffected. Drains shorter than 30s collapse the early timers to fire ASAP rather than queue past the drain end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(updater): UpdateExecutor — snapshot, fetch/checkout/install/build, exit 75 Pure-DI orchestrator: spawnFn, copyFile, readSha, saveState, exit are all injected so unit tests run the full pipeline without spawning real children or mutating the real install. Streams stdout/stderr to update.log via the now-best-effort appendLine helper (swallows fs errors so the executor itself never breaks on read-only / unwritable log dirs). Failure paths transition to rolling-back and return — the route layer hands off to RollbackHandler which owns the rollback exit, so we don't double-exit and lose tail lines. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(updater): RollbackHandler — health-check timer + crash-loop guard checkPendingVerification arms a 60s timer at boot when state is pending-verification and increments bootCount; bootCount>2 forces an immediate rollback (crash-loop guard). markVerified persists the verified state and stops the timer. performRollback restores the backup lockfile, runs git checkout <fromSha> and pnpm install, lands on rolled-back or rollback-failed (terminal) on sub-step failure, exits 75 either way so the supervisor restart brings the new state up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(updater): wire RollbackHandler into boot + UpdatePolicy honours rollback-failed - expressCreateServer now invokes checkPendingVerification before polling starts so a previous boot's pending-verification either re-arms the health-check timer or, when bootCount has climbed past the crash-loop threshold, forces an immediate rollback. - server.ts calls markBootHealthy after state hits RUNNING so /health-being-up is the implicit happy-path signal that cancels the rollback timer. - /admin/update/status surfaces execution + lastResult + lockHeld so the admin UI can render the right Apply / Cancel / Acknowledge state. - UpdatePolicy gains an `executionStatus` input. While it equals 'rollback-failed', canAuto / canAutonomous are denied (reason: rollback-failed-terminal); manual stays on because clicking Apply IS the intervention the terminal state needs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(updater): apply / cancel / acknowledge / log endpoints Strict admin-only POSTs that drive Tier 2's manual-click flow: - POST /admin/update/apply: acquire lock, persist preflight, run preflight, drain $drainSeconds, executeUpdate (which exits 75 on success), or run performRollback on a failure path (also exits 75). - POST /admin/update/cancel: cancel a pre-execute drain/preflight, write cancelled lastResult, release lock. - POST /admin/update/acknowledge: clear terminal states (preflight-failed, rolled-back, rollback-failed) back to idle. lastResult is preserved so the admin still sees what happened. - GET /admin/update/log: tail var/log/update.log (200 lines) for the in- progress UI. Strict admin auth. Also: - socketio hook exports getIo() so the apply endpoint can broadcast the drain shoutMessage outside the regular hook surface. - ep.json registers updateActions after admin/updateStatus. - 11 mocha integration tests cover auth, policy denial, execution-busy, acknowledge-clears-terminal, log content-type. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(updater): admin UI Apply/Cancel/Acknowledge + live log stream UpdatePage renders the right action set based on execution.status: Apply when idle/verified and policy allows, Cancel during preflight/draining, Acknowledge on terminal preflight-failed / rolled-back / rollback-failed. While the executor is in flight (preflight/draining/executing/rolling-back) the page polls /admin/update/log + /admin/update/status once a second and shows the rolling tail; polling stops automatically when the run terminates. lastResult and policy denial reasons surface localised copy. Buttons disable themselves while a network round-trip is in flight to dodge double-clicks. New i18n keys live under update.page.{apply,cancel, acknowledge,log,execution,policy.*,last_result.*}, update.execution.*, update.banner.terminal.rollback-failed, and update.drain.{t60,t30,t10}. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(updater): pad shoutMessage renders update.drain.* via html10n broadcastShout now sends {messageKey, values, sticky} so the existing pad-side shout pipeline can route through html10n.get(). The renderer gains a values pass-through so update.drain.t60 etc. interpolate {{seconds}}, and gives updater shouts a different gritter title (the banner.title localised string) so users know it's a system event rather than a generic admin message. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(updater): rollback uses git checkout -f + integration suite over tmp git repo RollbackHandler now does git checkout -f <fromSha> BEFORE overlaying the backup lockfile. Without -f, git refuses checkout when there are unstaged modifications to files it would overwrite — exactly the case after a partial executor run that mutated the working tree. With -f the partial mutation is discarded and the working tree returns to fromSha cleanly. The backup-lockfile copy is still done (belt-and-braces) but tolerates ENOENT since checkout already restored the right lockfile. The new integration suite at src/tests/backend/specs/updater-integration.ts exercises the full pipeline against a disposable git repo: happy path, install-fail rollback, build-fail rollback, crash-loop guard, and a target-sha-doesn't-exist rollback-failed terminal case. 5 mocha tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(updater): Playwright admin Apply / Cancel / Acknowledge flow Stubs /admin/update/status (and /admin/update/apply for the apply path) at the route level so we can assert UI transitions without actually running an update. Four scenarios: - Apply button POSTs and re-fetches status (>=2 status fetches total). - install-method-not-writable hides the button and shows localised denial copy. - rollback-failed terminal state shows the Acknowledge button and the "Manual intervention required" lastResult copy. - lockHeld=true hides Apply even when policy.canManual is on. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(updater): admin banner shows rollback-failed terminal alert When execution.status === 'rollback-failed' the banner switches to a role=alert with the strong update.banner.terminal.rollback-failed copy and overrides the regular "update available" framing — an admin who left the system in this state needs to fix it before any other admin work matters. Other terminal states (preflight-failed, rolled-back) are informational and surface on the page itself, not the banner. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(updater): Tier 2 admin docs + manual smoke runbook + CHANGELOG doc/admin/updates.md gains a full Tier 2 section: prerequisites (git install + process supervisor with sample systemd unit), Apply flow with timings, every failure mode and the resulting state, the four endpoints, and the signature-verification opt-in. Settings table picks up the new updates.* knobs. docs/superpowers/specs/2026-04-25-auto-update-runbook.md is the manual smoke runbook the design spec calls for: disposable VM, systemd unit, every observable transition (happy path, install/ build-fail rollback, crash-loop guard, rollback-failed terminal, cancel during drain) plus a sign-off checklist for the release cut. CHANGELOG Unreleased section explains the supervisor requirement and points readers at the runbook. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(updater): note docker-friendly update flows as follow-up work Tier 2 refuses Apply on installMethod=docker because in-container mutation doesn't survive a container restart. Adds a future-work note covering the two reasonable paths for an in-product docker Apply button (instructions-only vs deploy-webhook) and explicitly rules out mounting /var/run/docker.sock as a footgun. Watchtower gets a pointer for admins who want fully autonomous docker updates today. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(updater): address Qodo review (1-6) + Playwright strict-mode CI fix 1. Tier 2 endpoints now gate on tier in {manual, auto, autonomous} — notify and off return 404 to match the prior PR-1 behaviour. Gate is evaluated per-request via app.use middleware so a settings.json reload takes effect without a full restart, and so integration tests can flip the tier dynamically. Adds a regression test that exercises 404 at tier=notify across all four endpoints. 2. cancel/apply race fixed: /admin/update/cancel no longer releases the lock — apply's finally block owns it for the request's lifetime. Apply now reloads state after preflight and aborts with 409 cancelled-during- preflight if execution.status is no longer 'preflight' for the same targetTag. Prevents a second apply from sneaking in while the first is still running its slow checks, and prevents the post-cancel apply from continuing into drain/execute. 3. SessionDrainer now restores acceptingConnections=true at drain completion (not just on cancel). The lock + persisted execution.status prevent a fresh apply from racing in — the in-memory flag was redundant safety that turned into a wedge if the executor threw post-drain. Adds a unit test asserting the flag is restored after natural drain end. 4. PadMessageHandler drain guard switched from socket.json.send (a socket.io v2/v3 API that may not exist on v4) to socket.emit('message', ...) for consistency with the other disconnect paths in the file. 5. Spawn 'error' handlers added to runStep helpers in UpdateExecutor and RollbackHandler, plus the gpg verify-tag spawn in trustedKeys. Without them, a missing/unexecutable binary leaves the promise hanging forever and the update flow stuck in-flight. SpawnFn type extended to allow on('error', ...) listeners cleanly. Spawn errors now resolve with code 1 + the error message in stderr, so the existing failure-detection branches fire normally. 6. executeUpdate body wrapped in try/catch. An exception from readSha, saveState, copyFile, or any step now lands in a rolling-back persist + returns failed-checkout, so the route's post-executor rollback path picks it up. State can no longer wedge at 'executing'. The catch's inner saveState is itself try/wrapped so a write-after-write failure doesn't crash the route either. CI: Playwright update-page-actions strict-mode violation fixed. Both the banner and the lastResult <p> contain "Manual intervention required"; selector now scopes to p.last-result-rollback-failed for the lastResult assertion specifically. 129 vitest unit tests + 23 mocha integration tests passing; ts-check clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(updater): address Qodo #7 (status leak) + #8 (short-drain values) #7. /admin/update/status now redacts diagnostic strings for unauth callers even when requireAdminForStatus is left at its default (false). Status enum + outcome enum are kept (the admin banner / pad-side badge need them to render the right UI) but execution.reason / execution.fromSha / execution.targetTag and the same fields on lastResult are stripped. Authed admin sessions still get the full payload — they're looking at their own server's diagnostics. Two new mocha tests cover both paths: "redacts execution.reason / lastResult.reason for unauth callers" and "returns full diagnostic payload to authed admin sessions". #8. SessionDrainer no longer schedules T-30 / T-10 broadcasts when the configured drainSeconds can't honour them. Previously, with drainSeconds < 30 the T-30 timer fired at zero remaining but the broadcast still claimed "30 seconds" — misleading. Now T-30 only schedules when drainSeconds > 30 and T-10 only when > 10. Admins picking a short drain get fewer announcements but each carries an accurate countdown. The opening announcement now reports the configured drain length rather than a hardcoded 60. Two updated unit tests: drainSeconds=15 (skips T-30, still fires T-10) and drainSeconds=5 (skips both). 131 vitest unit + 26 mocha integration tests passing; ts-check clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(updater): address Qodo follow-up — tag injection, rollback rejections, state validation Qodo posted three new concerns after the first fix push. 1. Git tag option injection (security). The release tag from GitHub's tag_name flowed into `git checkout` / `git verify-tag` as a positional arg. A tag starting with '-' would be parsed as an option and could bypass signature verification or change checkout semantics. Mitigated in three layers: - New refSafety helper (isValidTag / assertValidTag / refsTagsForm) enforces a strict subset of git's check-ref-format spec: rejects leading '-' or '.', whitespace, control chars, and ~ ^ : ? * [ \\ and the '..' sequence. - VersionChecker validates tag_name before persisting to state, so a malformed value from a misconfigured githubRepo never lands on disk. - UpdateExecutor calls assertValidTag and uses the refs/tags/<tag> form for git checkout. trustedKeys also validates and adds '--' to git verify-tag for an end-of-options marker. updateActions does an up-front isValidTag check on state.latest.tag so a corrupt state file gets a clean 409 instead of a 500. 2. Unhandled rollback rejections. checkPendingVerification was firing `void deps.saveState(...)` and `void performRollback(...)` without .catch(), so an fs error during boot's rollback path would bubble out as an unhandled rejection. Both callsites now go through fireSaveState / fireRollback helpers that catch and log; rollback rejections fall through to a best-effort terminal-state write + exit 75 so the supervisor can re-try the next boot with bootCount++. 3. Execution state under-validated. isValidExecution previously checked only that `status` was a known enum value, so a hand-edited state file with `{execution: {status: 'pending-verification'}}` (missing fromSha / targetTag / deadlineAt) would pass validation and reach RollbackHandler with undefined refs. The validator now consults a per-status required-fields map mirroring the ExecutionStatus union in types.ts and rejects empty strings as well as missing fields. Same tightening applied to lastResult.outcome (must be in the allowed enum, not just any string). Six new unit tests cover hand-edited corruption. 145 vitest + 26 mocha tests green; ts-check clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Ships Tier 1 (notify) of the four-tier auto-update design at
docs/superpowers/specs/2026-04-25-auto-update-design.md.var/update-state.json./admin/updatepage (read-only) with version, install method, tier, changelog./api/version-statusnever leaks the running version string.adminEmailsetting (weekly while vulnerable, monthly while severely outdated). Real SMTP transport deferred to a follow-up; PR 1 ships the cadence/dedupe machinery.updates.*settings block with a default oftier: "notify". Settier: "off"to disable entirely.See
doc/admin/updates.mdfor full configuration.Architecture
src/node/updater/— pure modules:types,versionCompare,state(atomic write + schema validation),InstallMethodDetector,UpdatePolicy,VersionChecker(fetcher abstraction),Notifier(cadence decider) + boot wiring (index.ts).src/node/hooks/express/updateStatus.ts— public/api/version-status(cached 60s, no version leak) and admin/admin/update/status.src/node/utils/Settings.ts+settings.json.template+settings.json.docker.admin/src/components/UpdateBanner.tsx,admin/src/pages/UpdatePage.tsx, store slice, route, nav link, CSS.src/static/js/pad_version_badge.ts,src/templates/pad.html,src/static/css/pad.css.src/locales/en.json.doc/admin/updates.md(new),CHANGELOG.md(Unreleased section).Test plan
versionCompare,state,InstallMethodDetector,UpdatePolicy,VersionChecker,Notifier/admin/update/status(admin auth, shape) and/api/version-status(no version leak, severe state)null/severe/vulnerablepnpm ts-checkclean,pnpm run build:uisucceedsNotes
(would send email). Follow-up PR will add nodemailer or rely on a notification plugin.fix(updater)commits address bugs surfaced during code review (semver regex four-part rejection, null-email/array schema validation, ETag preservation on prerelease, tagChanged email cadence).🤖 Generated with Claude Code